Conserved taxonomy assignment

Methods:

  • Phylotype method at genus level.
  • OTU-based method with minimum set at 97% identity. The opticlust algorithm was used to optimize the quality of assigned taxonomy.
  • Phylogeny method where tree was taxonomically classified with minimum set at 97% identity


Quality of conserved taxonomy

The opticlust algorithm yielded high quality OTUs based on the high precision metrics below.


Table x: Statistical parameters calculated from the OTU-based classification method
Parameter Value
cutoff 0.2
Sensitivity 0.998
Specificity 0.999
PPV 0.998
NPV 0.999
Accuracy 0.999
MCC 0.997
F1score 0.998
FDR 0.002

Taxonomy assigned to observed OTUs

The taxonomy assignment is based on the phylotype method in mothur platform. Only taxon observed more than once are displayed.


Manual annotation


Figure x: Unfiltered and curated OTU abundance. Visual representation of taxon terms highlight the most abundant taxon based on frequency of being assigned to an OTU or tree nodes. Muribaculaceae is the most frequently assigned family and Muribaculaceae_ge is the most frequent species assigned to most sequences.




Most abundant species (top 15%)


OTUs
 [1] "Otu01" "Otu02" "Otu03" "Otu04" "Otu05" "Otu06" "Otu08" "Otu09"
 [9] "Otu10" "Otu18"

Phylum
[1] "Bacteroidetes" "Firmicutes"   

Class
[1] "Bacteroidia" "Clostridia" 

Order
[1] "Bacteroidales"   "Clostridiales"   "Lactobacillales"

Family
[1] "Bacteroidaceae"  "Lachnospiraceae" "Muribaculaceae"  "Ruminococcaceae"

Genus
[1] "Alistipes"          "Bacteroides"        "Lactobacillus"     
[4] "Mollicutes_RF39_ge" "Muribaculaceae_ge"  "Oscillospira"      
[7] "Turicibacter"      


Example of rank abundance


  • Demonstration using eight samples
  • Rank abundance curves can be used to display species richness and species evenness across selected samples.

Figure x. Rank abundance of eight samples. Package: goeveg




Correlation between observed phyla


Figure x. Correlation between species identified at phylum-level. Species are ordered alphabetically (top panel) and heuristically (bottom panel)




Alpha diversity


Species richness at each datapoint

Figure x: Stacked barplots for species richness. The estimated richness (green bars) was calculated using chao calculator and observed ichness (red bars) was calculated using sobs.



Species richness: Boxplots, density plots and histograms

Figure x: Species richness (observed species) displayed by boxplot (A), density plots (B) and histograms (C).



Species richness plots

Include are:

  • Line plots: Points joined by straight lines
  • Scatter plot with correlation coefficient
  • Jitter boxplots




Species accumulation




Individual species accumulation curves using different methods


Overplot graph of species accumulation curves


Rarefaction and Extrapolation (R/E)



Beta diversity


Heatmaps


Phyla


Class


Order


Family


Genus


PAM clustering



Number of best clusters


OTUbased
Number_clusters     Value_Index 
         3.0000          1.9922 
Phylum
Number_clusters     Value_Index 
         3.0000          4.9398 
Class
Number_clusters     Value_Index 
         3.0000          4.2837 
Order
Number_clusters     Value_Index 
          3.000           3.667 
Family
Number_clusters     Value_Index 
         2.0000          3.0988 
Genus
Number_clusters     Value_Index 
         3.0000          2.3948 



PCoA or PCO: Principal Coordinates Analysis




Point plots: PCoA Eigen Values


NMDS (Non-metric multidimensional scaling)


Square root transformation
Wisconsin double standardization
Run 0 stress 0.02311505 
Run 1 stress 0.02438405 
Run 2 stress 0.02311359 
... New best solution
... Procrustes: rmse 0.0005183599  max resid 0.0007295316 
... Similar to previous best
Run 3 stress 0.0004584523 
... New best solution
... Procrustes: rmse 0.1483403  max resid 0.2621175 
Run 4 stress 9.860654e-05 
... New best solution
... Procrustes: rmse 0.04889454  max resid 0.07544724 
Run 5 stress 0.02226818 
Run 6 stress 9.69273e-05 
... New best solution
... Procrustes: rmse 0.001965716  max resid 0.003977862 
... Similar to previous best
Run 7 stress 9.999656e-05 
... Procrustes: rmse 0.08246718  max resid 0.1334889 
Run 8 stress 0.02226872 
Run 9 stress 0.02438409 
Run 10 stress 0.0004451715 
... Procrustes: rmse 0.1210363  max resid 0.1955044 
*** Solution reached
Square root transformation
Wisconsin double standardization
Run 0 stress 5.792227e-06 
Run 1 stress 0.09233043 
Run 2 stress 0.2004274 
Run 3 stress 9.882603e-05 
... Procrustes: rmse 0.05899969  max resid 0.1129256 
Run 4 stress 9.81698e-05 
... Procrustes: rmse 0.05810476  max resid 0.08380598 
Run 5 stress 0.06988752 
Run 6 stress 9.901129e-05 
... Procrustes: rmse 0.08357369  max resid 0.1174309 
Run 7 stress 9.92299e-05 
... Procrustes: rmse 0.07699285  max resid 0.122011 
Run 8 stress 9.975641e-05 
... Procrustes: rmse 0.13685  max resid 0.202801 
Run 9 stress 0.1448684 
Run 10 stress 9.132325e-05 
... Procrustes: rmse 0.04390869  max resid 0.07203059 
Run 11 stress 0.0001132911 
... Procrustes: rmse 0.1273508  max resid 0.1978526 
Run 12 stress 9.786936e-05 
... Procrustes: rmse 0.1216092  max resid 0.1935745 
Run 13 stress 9.26535e-05 
... Procrustes: rmse 0.05651613  max resid 0.1103798 
Run 14 stress 0.06988741 
Run 15 stress 9.423633e-05 
... Procrustes: rmse 0.1405515  max resid 0.2836626 
Run 16 stress 8.171376e-05 
... Procrustes: rmse 0.0624041  max resid 0.09980293 
Run 17 stress 9.507963e-05 
... Procrustes: rmse 0.05671056  max resid 0.08367146 
Run 18 stress 9.942293e-05 
... Procrustes: rmse 0.05752394  max resid 0.09767716 
Run 19 stress 0.0003890592 
... Procrustes: rmse 0.1387151  max resid 0.208923 
Run 20 stress 9.977772e-05 
... Procrustes: rmse 0.1185833  max resid 0.1793281 
*** No convergence -- monoMDS stopping criteria:
     2: no. of iterations >= maxit
    13: stress < smin
     5: stress ratio > sratmax
Square root transformation
Wisconsin double standardization
Run 0 stress 0.001418859 
Run 1 stress 0.00141893 
... Procrustes: rmse 2.590167e-05  max resid 3.969124e-05 
... Similar to previous best
Run 2 stress 0.05061396 
Run 3 stress 0.001418881 
... Procrustes: rmse 4.112065e-05  max resid 6.294641e-05 
... Similar to previous best
Run 4 stress 0.1108941 
Run 5 stress 0.001418859 
... New best solution
... Procrustes: rmse 1.216135e-06  max resid 1.548384e-06 
... Similar to previous best
Run 6 stress 0.1513699 
Run 7 stress 0.001418878 
... Procrustes: rmse 1.158292e-05  max resid 1.778304e-05 
... Similar to previous best
Run 8 stress 0.1727749 
Run 9 stress 0.001418908 
... Procrustes: rmse 2.245688e-05  max resid 3.426386e-05 
... Similar to previous best
Run 10 stress 0.1319101 
*** Solution reached
Square root transformation
Wisconsin double standardization
Run 0 stress 0.02218756 
Run 1 stress 0.02218377 
... New best solution
... Procrustes: rmse 0.0009434361  max resid 0.001323574 
... Similar to previous best
Run 2 stress 0.02218658 
... Procrustes: rmse 0.0007002663  max resid 0.0009812776 
... Similar to previous best
Run 3 stress 0.02217476 
... New best solution
... Procrustes: rmse 0.003355352  max resid 0.004665514 
... Similar to previous best
Run 4 stress 0.02223982 
... Procrustes: rmse 0.01210337  max resid 0.01706141 
Run 5 stress 0.02217595 
... Procrustes: rmse 0.002522356  max resid 0.003498614 
... Similar to previous best
Run 6 stress 0.0221821 
... Procrustes: rmse 0.002897763  max resid 0.00402172 
... Similar to previous best
Run 7 stress 0.08213731 
Run 8 stress 0.02226069 
... Procrustes: rmse 0.01428661  max resid 0.01999361 
Run 9 stress 0.08244629 
Run 10 stress 0.02217762 
... Procrustes: rmse 0.001403351  max resid 0.001944862 
... Similar to previous best
*** Solution reached
Square root transformation
Wisconsin double standardization
Run 0 stress 0.0005606845 
Run 1 stress 0.01736873 
Run 2 stress 0.005096509 
Run 3 stress 0.0008738854 
... Procrustes: rmse 0.1006689  max resid 0.1706016 
Run 4 stress 0.00138802 
Run 5 stress 0.0001893003 
... New best solution
... Procrustes: rmse 0.03750972  max resid 0.05272904 
Run 6 stress 0.01750833 
Run 7 stress 0.01747092 
Run 8 stress 0.01832696 
Run 9 stress 0.01832654 
Run 10 stress 0.001450898 
Run 11 stress 0.01832641 
Run 12 stress 8.579538e-05 
... New best solution
... Procrustes: rmse 0.01696292  max resid 0.0327858 
Run 13 stress 0.002567052 
Run 14 stress 0.01768765 
Run 15 stress 0.002013325 
Run 16 stress 0.01756597 
Run 17 stress 0.01832713 
Run 18 stress 0.00161634 
Run 19 stress 0.01520625 
Run 20 stress 0.01832661 
*** No convergence -- monoMDS stopping criteria:
    13: no. of iterations >= maxit
     1: stress < smin
     6: stress ratio > sratmax
Square root transformation
Wisconsin double standardization
Run 0 stress 0.00644259 
Run 1 stress 0.001327398 
... New best solution
... Procrustes: rmse 0.1068776  max resid 0.1493299 
Run 2 stress 0.000654329 
... New best solution
... Procrustes: rmse 0.06375063  max resid 0.1025383 
Run 3 stress 0.01031367 
Run 4 stress 0.002635087 
Run 5 stress 0.005949759 
Run 6 stress 0.003828015 
Run 7 stress 9.904256e-05 
... New best solution
... Procrustes: rmse 0.1479827  max resid 0.2950912 
Run 8 stress 0.0004453573 
... Procrustes: rmse 0.0631433  max resid 0.09089009 
Run 9 stress 0.000133494 
... Procrustes: rmse 0.02749027  max resid 0.03915047 
Run 10 stress 9.777086e-05 
... New best solution
... Procrustes: rmse 0.1450925  max resid 0.2926526 
Run 11 stress 0.0001565034 
... Procrustes: rmse 0.1435442  max resid 0.2392662 
Run 12 stress 0.001407042 
Run 13 stress 0.0002049246 
... Procrustes: rmse 0.03291418  max resid 0.05021331 
Run 14 stress 0.0001726277 
... Procrustes: rmse 0.1432306  max resid 0.2399713 
Run 15 stress 0.002818962 
Run 16 stress 0.001279359 
Run 17 stress 0.003963784 
Run 18 stress 0.000833141 
Run 19 stress 0.009254673 
Run 20 stress 0.009023119 
*** No convergence -- monoMDS stopping criteria:
    18: no. of iterations >= maxit
     2: stress < smin


Parameters used for the NMDS


OTUs
----------------------------

Call:
metaMDS(comm = otu.t, distance = "bray", k = 3, try = 10, display = c("sites"),      choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(otu.t)) 
Distance: bray 

Dimensions: 3 
Stress:     9.69273e-05 
Stress type 1, weak ties
Two convergent solutions found after 10 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(otu.t))' 

Phylum
----------------------------

Call:
metaMDS(comm = phylum.t, distance = "bray", k = 3, try = 10,      display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(phylum.t)) 
Distance: bray 

Dimensions: 3 
Stress:     5.792227e-06 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(phylum.t))' 

Class
----------------------------

Call:
metaMDS(comm = class.t, distance = "bray", k = 3, try = 10, display = c("sites"),      choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(class.t)) 
Distance: bray 

Dimensions: 3 
Stress:     0.001418859 
Stress type 1, weak ties
Two convergent solutions found after 10 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(class.t))' 

Order
----------------------------

Call:
metaMDS(comm = order.t, distance = "bray", k = 3, try = 10, display = c("sites"),      choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(order.t)) 
Distance: bray 

Dimensions: 3 
Stress:     0.02217476 
Stress type 1, weak ties
Two convergent solutions found after 10 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(order.t))' 

Family
----------------------------

Call:
metaMDS(comm = family.t, distance = "bray", k = 3, try = 10,      display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(family.t)) 
Distance: bray 

Dimensions: 3 
Stress:     8.579538e-05 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(family.t))' 

Genus
----------------------------

Call:
metaMDS(comm = genus.t, distance = "bray", k = 3, try = 10, display = c("sites"),      choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(genus.t)) 
Distance: bray 

Dimensions: 3 
Stress:     9.777086e-05 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(genus.t))' 


Sherperd and non-metric multidimensional scaling plots.



Figure X. Sherperd and non-metric multidimensional scaling plot. Green oints represent samples and red points represent OTU or species. Similar samples are ordinated together. Stress values are shown at the botthom of ordination plot.



Phylogenetic clustering and tree annotation


Figure x: The circular phylograms (A), unrooted cladogram (B), and the rectangular phylograms (C) display the relationships of the 360 samples used in the case study. Female (red) and male (blue) linked with sequence counts showing the proportion of the number of classified (green) and unclassified (red) displayed on a pie chart followed by the phyla abundance (heatmap) and species richness bar chart showing the observed (green) and estimated (maroon) richness. A portion of the tree (D) is enlarged to show some details.

Posible questions


Alpha diversity

  • QN1: Are the values obtained too sensitive to sampling?
  • QN2: Was the sampling effort sufficient to account for most OTUs present in a sample?
  • QN3: Is there a need to continue with re-sampling?
  • QN4: …….?


Beta diversity

  • QN1: …….?
  • QN2: …….?
  • QN3: …….?
  • QN4: …….?


More intervention by investigators


Summary of packages used in the analysis

R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] NbClust_3.0     iNEXT_2.0.19    goeveg_0.4.2    vegan_2.5-4    
 [5] permute_0.9-4   scales_1.0.0    ggpubr_0.2      magrittr_1.5   
 [9] dplyr_0.8.0.1   reshape2_1.4.3  funModeling_1.7 Hmisc_4.2-0    
[13] ggplot2_3.1.0   Formula_1.2-3   survival_2.43-3 lattice_0.20-38

loaded via a namespace (and not attached):
 [1] maps_3.3.0          splines_3.5.2       dotCall64_1.0-0    
 [4] gtools_3.8.1        moments_0.14        assertthat_0.2.1   
 [7] latticeExtra_0.6-28 pander_0.6.3        yaml_2.2.0         
[10] slam_0.1-45         corrplot_0.84       pillar_1.3.1       
[13] backports_1.1.3     glue_1.3.1          digest_0.6.18      
[16] RColorBrewer_1.1-2  checkmate_1.9.1     colorspace_1.4-1   
[19] cowplot_0.9.4       htmltools_0.3.6     Matrix_1.2-15      
[22] plyr_1.8.4          tm_0.7-6            pkgconfig_2.0.2    
[25] purrr_0.3.2         gdata_2.18.0        htmlTable_1.13.1   
[28] tibble_2.1.1        mgcv_1.8-27         withr_2.1.2        
[31] ROCR_1.0-7          nnet_7.3-12         lazyeval_0.2.1     
[34] NLP_0.2-0           crayon_1.3.4        evaluate_0.13      
[37] nlme_3.1-137        MASS_7.3-51.1       gplots_3.0.1.1     
[40] xml2_1.2.0          foreign_0.8-71      tools_3.5.2        
[43] data.table_1.12.0   hms_0.4.2           stringr_1.4.0      
[46] munsell_0.5.0       cluster_2.0.7-1     entropy_1.2.1      
[49] compiler_3.5.2      caTools_1.17.1.1    rlang_0.3.4        
[52] grid_3.5.2          rstudioapi_0.10     spam_2.2-1         
[55] htmlwidgets_1.3     bitops_1.0-6        base64enc_0.1-3    
[58] labeling_0.3        rmarkdown_1.12      gtable_0.2.0       
[61] R6_2.4.0            gridExtra_2.3       knitr_1.22         
[64] readr_1.3.1         KernSmooth_2.23-15  stringi_1.4.3      
[67] parallel_3.5.2      Rcpp_1.0.1          fields_9.6         
[70] rpart_4.1-13        acepack_1.4.1       tidyselect_0.2.5   
[73] xfun_0.6